Annotation of Nuggets and Relevance in GALE Distillation Evaluation
Abstract
This paper presents an approach to annotation that BAE Systems has employed in the DARPA GALE Phase 2 Distillation evaluation. The purpose of the GALE Distillation evaluation is to quantify the amount of relevant and non-redundant information a distillation engine is able to produce in response to a specific, formatted query, and to compare that amount with the amount of information gathered by a bilingual human using commonly available state-of-the-art tools. As part of the evaluation, following the NIST evaluation methodology for complex question answering (Voorhees, 2003), human annotators were asked to establish the relevancy of responses as well as the presence of atomic facts or information units, called nuggets of information. This paper discusses various challenges to the annotation of nuggets, called nuggetization, including the interaction between the granularity of nuggets and the relevancy of those nuggets to the query in question. The proposed approach views nuggetization as a procedural task and allows annotators to revisit nuggetization based on the requirements imposed by relevancy guidelines defined with a specific end-user in mind. The paper shows that this approach produces consistent annotations with high inter-annotator agreement scores.

This material is based upon work supported by the Defense Advanced Research Projects Agency (DARPA/IPTO), Global Autonomous Language Exploitation, contract #HR0011-06-C-003. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the Defense Advanced Research Projects Agency or the U.S. Government.